Management of Numerical Simulation Data With Multidimensional Arrays
Scientific applications, such as numerical simulations, generate an ever-increasing amount of data that needs to be managed efficiently. Because most traditional row-store Database Management Systems are not tailored to the analytical workloads these applications usually require, alternative approaches, e.g., column stores and multidimensional arrays, can offer faster query processing. In this work, we propose new techniques for managing the data produced by numerical simulations, such as those coming from HeMoLab, by using multidimensional array technologies.

We take advantage of multidimensional arrays, which naturally model the dimensions and variables used in numerical simulations. Mapping the simulation output file efficiently onto a multidimensional array is not simple. A naive solution may lead to sparse arrays, which degrades query response time, especially when the simulation uses irregular meshes to model its physical domain. We propose novel strategies to solve these problems by defining an efficient mapping of the coordinate values in numerical simulations that evenly distributes cells across array chunks, using equi-depth histograms and space-filling curves.

We evaluated our techniques through experiments over real-world data, comparing them with a columnar and a row-store relational system. The results indicate that multidimensional arrays and column stores are much faster than a traditional row-store system for queries issued over larger amounts of simulation data. The results also help to identify the scenarios in which multidimensional arrays are the most efficient approach, and those in which they are outperformed by the relational column-store approach.
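
To make the coordinate-mapping idea above concrete, the following is a minimal sketch, not the implementation evaluated in this work. It assumes a one-dimensional equi-depth histogram per axis and a two-dimensional Z-order (Morton) curve as the space-filling curve; the abstract does not say which curve is used, and the function names `equi_depth_boundaries`, `bucket_index`, and `morton_2d` are hypothetical.

```python
from bisect import bisect_right


def equi_depth_boundaries(values, num_buckets):
    """Return num_buckets - 1 cut points chosen so that each bucket
    receives roughly the same number of coordinate values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // num_buckets] for i in range(1, num_buckets)]


def bucket_index(value, boundaries):
    """Map a real-valued coordinate to the integer index of its bucket."""
    return bisect_right(boundaries, value)


def morton_2d(ix, iy, bits=16):
    """Interleave the bits of two non-negative integer indices (Z-order).
    Assumption: a Z-order curve stands in for the space-filling curve."""
    code = 0
    for b in range(bits):
        code |= ((ix >> b) & 1) << (2 * b)
        code |= ((iy >> b) & 1) << (2 * b + 1)
    return code


# Irregular mesh coordinates: most points cluster near zero.
xs = [0.0, 0.01, 0.02, 0.03, 0.5, 0.9, 5.0, 100.0]
cuts = equi_depth_boundaries(xs, 4)
indices = [bucket_index(x, cuts) for x in xs]
```

With equal-width bins over the range 0 to 100, seven of these eight points would share one bin and the remaining bins would be almost empty, which is the sparsity problem described above. The equi-depth cut points instead place two points in each of the four buckets, and a curve such as `morton_2d` can then linearize the resulting integer indices so that neighbouring cells tend to stay close together.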
